AITopics | class imbalance

Collaborating Authors

class imbalance

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

FedReLa: Imbalanced Federated Learning via Re-Labeling

Hu, Guangzheng, Menéndez, Patricia, Liu, Feng, Gong, Mingming, Wang, Guanghui, Peng, Liuhua

arXiv.org Machine LearningJun-25-2026

Federated learning has emerged as the foremost approach for decentralized model training with privacy preservation. The global class imbalance and cross-client data heterogeneity naturally coexist, and the mismatch between local and global imbalances exacerbates the performance degradation of the aggregated model. The agnosticism of global class distribution poses significant challenges for data-level methods, especially under extreme conditions with severe class absence across clients. In this paper, we propose FedReLa, a novel data-level approach that tackles the coexistence of data heterogeneity and class imbalance in federated learning. By re-labeling samples with a feature-dependent label re-allocator, FedReLa corrects biased global decision boundaries without requiring knowledge of the global class distribution. This modular, model-agnostic approach can be integrated with algorithmic methods to deliver consistent improvements without additional communication overhead. Through extensive experiments, our method significantly improves the accuracy of minority classes and the overall accuracy on stepwise-imbalanced and long-tailed datasets, outperforming the previous state of the art.

artificial intelligence, fedrela, machine learning, (13 more...)

arXiv.org Machine Learning

2606.26037

Country:

North America > Canada > Ontario (0.28)
Oceania > Australia (0.28)
Asia (0.28)

Genre: Research Report (1.00)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.36)

Add feedback

Geometric Imbalance in Semi-Supervised Node Classification

Neural Information Processing SystemsJun-22-2026, 17:34:18 GMT

Class imbalance in graph data presents a significant challenge for effective node classification, particularly in semi-supervised scenarios. In this work, we formally introduce the concept of geometric imbalance, which captures how message passing on class-imbalanced graphs leads to geometric ambiguity among minority-class nodes in the riemannian manifold embedding space. We provide a rigorous theoretical analysis of geometric imbalance on the riemannian manifold and propose a unified framework that explicitly mitigates it through pseudo-label alignment, node reordering, and ambiguity filtering. Extensive experiments on diverse benchmarks show that our approach consistently outperforms existing methods, especially under severe class imbalance. Our findings offer new theoretical insights and practical tools for robust semi-supervised node classification.

data mining, machine learning, node, (21 more...)

Neural Information Processing Systems

Country: Asia > China (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

Add feedback

CLIMB: Class-imbalanced Learning Benchmark on Tabular Data

Neural Information Processing SystemsJun-16-2026, 22:51:07 GMT

Class-imbalanced learning (CIL) on tabular data is important in many realworld applications where the minority class holds the critical but rare outcomes.

data mining, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre:

Overview (0.92)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Banking & Finance (1.00)
Health & Medicine > Diagnostic Medicine (0.92)
(2 more...)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)

Add feedback

Dual Prototype-Enhanced Contrastive Framework for Class-Imbalanced Graph Domain Adaptation

Neural Information Processing SystemsJun-16-2026, 22:12:59 GMT

Graph transfer learning, especially in unsupervised domain adaptation, aims to transfer knowledge from a label-abundant source graph to an unlabeled target graph. However, most existing approaches overlook the common issue of label imbalance in the source domain, typically assuming a balanced label distribution that rarely holds in practice. Moreover, they face challenges arising from biased knowledge in the source graph and substantial domain distribution shifts. To remedy the above challenges, we propose a dual-branch prototype-enhanced contrastive framework for graph domain adaptation under a class-imbalanced scenario. Specifically, we introduce a dual-branch graph encoder to capture both local and global information, generating class-specific prototypes from a distilled anchor set. Then, a prototypeenhanced contrastive learning framework is introduced. On the one hand, we encourage class alignment between the two branches based on constructed prototypes to alleviate the bias introduced by class imbalance. On the other hand, we infer the pseudo-labels for the target domain and align sample pairs across domains that share similar semantics to reduce domain discrepancies. Experimental results show that our ImGDA outperforms the state-of-the-art methods across multiple datasets and settings.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Fourier Clouds: Fast Bias Correction for Imbalanced Semi-Supervised Learning

Neural Information Processing SystemsJun-13-2026, 13:41:31 GMT

Pseudo-label-based Semi-Supervised Learning (SSL) often suffers from classifier bias, particularly under class imbalance, as inaccurate pseudo-labels tend to exacerbate existing biases towards majority classes. Existing methods, such as \textit{CDMAD}\cite{cdmad}, utilize simplistic reference inputs--typically uniform or blank-colored images--to estimate and correct this bias. However, such simplistic references fundamentally ignore realistic statistical information inherent to real datasets, specifically typical color distributions, texture details, and frequency characteristics. This lack of \emph{statistical representativeness} can lead the model to inaccurately estimate its inherent bias, limiting the effectiveness of bias correction, particularly under severe class imbalance or substantial distribution mismatches between labeled and unlabeled datasets. To overcome these limitations, we introduce the \textbf{FARAD} (Fourier-Adapted Reference for Accurate Debiasing) System.

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Self-Perturbed Anomaly-Aware Graph Dynamics for Multivariate Time-Series Anomaly Detection

Neural Information Processing SystemsJun-12-2026, 22:02:36 GMT

Detecting anomalies in multivariate time-series data is an essential task across various domains, yet there are unresolved challenges such as (1) severe class imbalance between normal and anomalous data due to rare anomaly availability in the real world; (2) limited adaptability of the static graph-based methods to dynamically changing inter-variable correlations; and (3) neglect of subtle anomalies due to overfitting to normal patterns in reconstruction-based methods. To tackle these issues, we propose Self-Perturbed Anomaly-Aware Graph Dynamics (SPAGD), a framework for time-series anomaly detection. SPAGD employs a self-perturbation module that generates self-perturbed time series from the reconstruction process of normal ones, which provide auxiliary signals to alleviate class imbalance during training. Concurrently, an anomaly-aware graph construction module is proposed to dynamically adjust the graph structure by leveraging the reconstruction residuals of self-perturbed time series, thereby emphasizing the inter-variable disruptions induced by anomalous candidates. A unified spatio-temporal anomaly detection module then integrates both spatial and temporal convolutions to train a classifier that distinguishes normal time series from the auxiliary self-perturbed samples. Extensive experiments across multiple benchmark datasets demonstrate the effectiveness of SPAGD compared to state-of-the-art baselines.

artificial intelligence, data mining, machine learning, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Improved Balanced Classification with Theoretically Grounded Loss Functions

Neural Information Processing SystemsJun-12-2026, 17:47:00 GMT

The *balanced loss* is a widely adopted objective for multi-class classification under class imbalance. By assigning equal importance to all classes, regardless of their frequency, it promotes fairness and ensures that minority classes are not overlooked. However, directly minimizing the balanced classification loss is typically intractable, which makes the design of effective surrogate losses a central question. This paper introduces and studies two advanced surrogate loss families: Generalized Logit-Adjusted (GLA) loss functions and Generalized Class-Aware weighted (GCA) losses. GLA losses generalize Logit-Adjusted losses, which shift logits based on class priors, to the broader general cross-entropy loss family. GCA loss functions extend the standard class-weighted losses, which scale losses inversely by class frequency, by incorporating class-dependent confidence margins and extending them to the general cross-entropy family.

artificial intelligence, machine learning, proceedings, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.77)

Add feedback

Simultaneous Long-tailed Recognition and Multi-modal Fusion for Highly Imbalanced Multi-modal Data

Yoon, Heegeon, Kim, Heeyoung

arXiv.org Machine LearningMay-12-2026

As datasets continue to expand in size and complexity, these models have become increasingly sophisticated, with deeper architectures and greater expressive power. Despite these advances, DNNs trained on imbalanced class distributions often exhibit a tendency to favor majority classes, leading to degraded performance on underrepresented classes [18, 39, 27, 17]. Because many real-world datasets follow long-tailed distributions in which minority classes can contain critical and informative patterns, developing methods that enable DNNs to learn effectively from imbalanced data is essential to prevent the loss of valuable information from these rare classes [26, 34, 16]. Moreover, data encountered in real-world applications are frequently multi-modal, meaning that observations originate from heterogeneous sources [6, 29, 7, 35]. To make effective use of such heterogeneous inputs, a wide range of multi-modal learning approaches have been proposed that exploit complementary information across modalities to enhance predictive performance [10, 5]. Common strategies integrate multiple modalities into a unified representation, using techniques that span from straightforward feature-level concatenation [19, 11, 12] to more sophisticated neural architectures that learn joint representations in an end-to-end manner [20, 32]. Although prior research has extensively studied class imbalance and multi-modal data separately, relatively little attentionhas beengiven to settings where bothchallenges arise si2 multaneously. Developing methods that can effectively handle long-tailed class distributions in conjunction with multi-modal inputs is therefore essential in many real-world applications. In the medical domain, for instance, datasets often contain far more samples from healthy individuals than from patients with specific conditions, while also encompassing diverse datatypes such asimagingdata(e.g., X-rays)alongsideauxiliary informationincluding demographics and clinical histories.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

2605.10498

Genre: Research Report (0.82)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Multimodal Deep Generative Model for Semi-Supervised Learning under Class Imbalance

Yoon, Heegeon, Kim, Heeyoung

arXiv.org Machine LearningMay-8-2026

When modeling class-imbalanced data, it is crucial to address the imbalance, as models trained on such data tend to be biased towards the majority classes. This problem is amplified under partial supervision, where pseudo-labels for unlabeled data are predicted based on imbalanced labeled data, propagating the bias. While recent semi-supervised models address class imbalance, they typically assume single-modal input data. However, with the growing availability of multimodal data, it is essential to leverage complementary modalities. In this article, we propose a multimodal deep generative model for semi-supervised learning under class imbalance. Our approach uses separate encoders for each modality, sharing latent variables across modalities, and simplifies joint posterior computation with a product-of-experts method. To further address class imbalance, we replace typical Gaussian distributions with Student's t-distributions for the prior, encoder, and decoder, better capturing the heavy-tailed latent distributions in imbalanced data. We derive a new objective function for training the proposed model on both labeled and unlabeled data using $γ$-power divergence. Empirical results on benchmark and real-world datasets demonstrate that our model outperforms baseline methods in generalization, achieving superior classification performance for partially labeled multimodal data with imbalanced class distributions.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Machine Learning

2605.06289

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.61)

Add feedback

Generalized Data Weighting via Class-level Gradient Manipulation

Neural Information Processing SystemsApr-26-2026, 13:16:57 GMT

Label noise and class imbalance are two major issues coexisting in real-world datasets. To alleviate the two issues, state-of-the-art methods reweight each instance by leveraging a small amount of clean and unbiased data. Yet, these methods overlook class-level information within each instance, which can be further utilized to improve performance. To this end, in this paper, we propose Generalized Data Weighting (GDW) to simultaneously mitigate label noise and class imbalance by manipulating gradients at the class level. To be specific, GDW unrolls the loss gradient to class-level gradients by the chain rule and reweights the flow of each gradient separately.

artificial intelligence, gradient flow, machine learning, (9 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Communications > Networks > Sensor Networks (0.46)

Add feedback